Automatically authoring video compositions using video cliplets

ABSTRACT

A system and a method for automatically authoring video compositions from longer units of digitized video (or a source video) by using short segments of video (or video “cliplets”). The video composition authoring process provides an aesthetically-pleasing layout of data elements to create a video composition. The data elements include multimedia elements, parameter information and description information. Any data elements that are missing but required are automatically selected by the system. The user may then review preliminary video composition results and refine the results if desired. The video composition authoring system includes an element selection and layout module for selecting the data elements in the video composition, and an iterative refinement module that allows the user to change and refine the preliminary results.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. Ser. No. 10/177,460, entitled “System and Method for Automatically Authoring Video Compositions Using Video Cliplets”, filed on Jun. 19, 2002, now U.S. Pat. No. 7,222,300 issued on May 22, 2007, whereby the entire contents of this document are hereby incorporated by reference.

BACKGROUND

Video cameras (or camcorders) are devices that are popular with amateur videographers for home use. A video camera may be a digital camera, which stores digital video on a memory device, or an analog video camera, which stores video footage on magnetic videotape. Video footage captured by an analog video camera may be converted into digitized format using well-known techniques. This digital video may be processed using software running on a computing device (such as a personal computer) to edit and manipulate the data captured by video cameras.

The traditional home digital video paradigm expects a user to shoot good video, perform tedious video editing, and then output a single large video containing the edited movie. One problem, however, with this paradigm is that raw video footage, even when professionally photographed, is difficult and tedious to edit. Professional editors with professional training and using high-end editing tools can take hours to edit raw video into a final version that is just minutes in duration. Moreover, most raw video footage is boring, and poring over hours of raw video is quite a tedious task, especially for an amateur.

Yet another problem is that current video editing software for amateur use is modeled after professional editing systems. This tends to make the software difficult for the average consumer to use. User interfaces of current video editing software typically provide a user with one view of the raw video footage. A timeline is placed alongside the footage to give the user temporal orientation. The timeline may include several different “tracks”, such as a video 1 track, a video 2 track, an audio 1 track, and so forth. The user interface includes controls similar to a VCR, such as play, fast-forward and rewind buttons. Using these buttons, a user browses the video footage by moving back and forth across the footage using the controls. This process of browsing the video footage is called “scrubbing”.

Still another problem is that current video editing software assumes that the output is yet another video, in which simple playback is the intended mode of viewing. Users may, however, wish to create other compositions with their video. These compositions include a two-dimensional layout of video that is analogous to a photo album, and a hyperlinked document in which the user chooses how to navigate the composition interactively.

Nevertheless, an amateur videographer often desires to produce nice, shorter video compositions of their longer, unedited raw video footage. The video composition may be, for example, a “highlights” video that contains the most interesting segments of the raw video footage, or a 2D video album, or an interactive hypertext document. However, the editing process of scrubbing the video to determine the location of cuts in the video footage is a tedious, repetitive and time-consuming task and must be performed manually. Thus, for the average consumer the process of editing video to produce a video composition is a difficult and burdensome task.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

The video composition authoring system and method described herein is capable of authoring video compositions that contain video cliplets. In general, a video cliplet (or “cliplet”) is an ultra-short segment of digital video created by dividing up longer units of video (or a source video). Typically, a video cliplet is expected to be approximately between five and ten seconds in duration, but may be any length in practice. The video composition includes a two-dimensional arrangement or collage of multimedia elements such as music, text, and photographs and also includes a single viewing window where cliplets are chained together and played consecutively (such as a highlights video of a source video).

The video composition authoring system of the invention overcomes the problems of traditional home video editing software by using cliplets (video of very short duration) as the main unit of manipulation, rather than a large source video. The system allows a user to interactively author a video composition from a source video while alleviating the tedious editing process. Editing with cliplets means that the pieces are pre-cut, and that manipulation of video is performed on short segments of video instead of long, tedious stretches of video. In addition, a user can have as much or as little interaction with the system as desired. Any information not provided interactively by the user is intelligently provided by the system.

In general, the video composition authoring system and method provide an aesthetically-pleasing layout of data elements to create a video composition. The data elements include multimedia elements, parameter information and description information. By way of example, multimedia elements include a set of video cliplets, video, background music, background photographs, clip art, text descriptions, titles, and so forth. Moreover, parameter information describes parameters of the video composition and includes, for example, the size of the composition and the duration of each video within the video composition. In addition, the description information includes, for example, time and location information about the cliplets or video and a description of the desired style or mood of the composition.

Any data elements that are missing but needed (as determined by the system) are automatically selected by the system. If a user specifies fewer than all the data elements needed to author a video composition, the system automatically chooses the missing elements. From these selections, preliminary video composition results are generated for user review. If the user likes the preliminary video composition results, they become the output video composition. Otherwise, the user is allowed to make refinements and changes to any portion of the preliminary video composition results until the user is satisfied.

The video composition authoring system includes an element selection and layout module for selecting the data elements and designing the layout of the elements in the video composition. The element selection and layout module uses those data elements selected by a user in the user input. If there is no user input, then the element selection and layout module automatically selects data elements to use. Any data elements that are needed to complete the video composition but were not specified by the user are automatically selected by the element selection and layout module to complement or supplement the user's choices. The element selection and layout module provides functionality for, and interaction with, the data elements using, for example, cliplet interest ratings and usage statistics.

The video composition authoring system also includes an iterative refinement module for presenting preliminary video composition results to the user and allowing the user to change and refine any elements as desired by the user. The iterative refinement module allows the user to change parameters, manually drag and drop different cliplets into the composition, add text bubbles to the cliplets, and crop certain cliplets using tracking algorithms to track moving objects. The changes are made by the iterative refinement module and updated preliminary video composition results containing the refinements are presented to the user. Once the user is satisfied with the current preliminary video composition results, the authoring process is finished and a finished video composition is sent as output. The output form of the finished video composition can be, for example, a two-dimensional collage, a single movie, or a photoalbum-style hypertext “book”. The output form can be selected by the user or, in the absence of a user choice, be automatically selected by the system.

It should be noted that alternative embodiments are possible, and that steps and elements discussed herein may be changed, added, or eliminated, depending on the particular embodiment. These alternative embodiments include alternative steps and alternative elements that may be used, and structural changes that may be made, without departing from the scope of the invention.

DRAWINGS DESCRIPTION

Referring now to the drawings in which like reference numbers represent corresponding parts throughout:

FIG. 1 is a diagram illustrating the concept of the video cliplet in relation to a longer unit of video and video frames.

FIG. 2 is a block diagram illustrating an overview of the video composition authoring system disclosed herein.

FIG. 3 is a block diagram illustrating a computing apparatus suitable for carrying out the invention.

FIG. 4 is a general flow diagram illustrating the operation of the video composition authoring system shown in FIG. 2.

FIG. 5 is a detailed flow diagram illustrating the operational details of the element selection and layout module shown in FIG. 2.

FIG. 6 is a detailed flow diagram illustrating the operational details of the iterative refinement module shown in FIG. 2.

FIG. 7 is a computer user interface illustrating a working example of the output of the video composition authoring system.

DETAILED DESCRIPTION

In the following description of the video composition authoring system and method, reference is made to the accompanying drawings, which form a part thereof, and in which is shown by way of illustration a specific example whereby the video composition authoring system and method may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the claimed subject matter.

I. Introduction to Video Cliplets

The video composition authoring system and method described herein is capable of authoring video compositions that contain video cliplets. In general, a video cliplet (or “cliplet”) is an ultra-short segment of digital video created by cutting up longer units of video (or a source video). The duration of the cliplet is restricted by a hard or soft constraint that is determined manually by a user or automatically. Typically, a video cliplet is expected to be approximately between five and ten seconds in duration, but may be any length in practice.

The idea of cliplets is that meaningful and short segments of video are extracted from longer units of video with only secondary regard for what are traditionally considered shot boundaries. Cliplets, therefore, can be based on other non-traditional cues such as audio cues (such as trying to detect sound bites) or video cues (such as trying to detect zoomed-in close-ups). In addition, cliplets can overlap. Cliplets need not cover the entire source video. This means that a really boring and uninteresting section of the source video may be excluded altogether. All of this achieves the goal of having each cliplet be a semantically meaningful portion of video.

The following features distinguish a cliplet from other segments of video. First, prior to generation a duration constraint (i.e., a constraint on the cliplet length) is determined. This constraint may take the form of hard upper and lower bounds, or it may be a soft constraint that takes into account other factors, such as average cliplet length over the source video, frequency of sub-shot boundaries, variance in cliplet length, local features of the audio or video, and so forth. Second, a cliplet does not necessarily need to be an independent video. The cliplet could be a pair of starting and stopping points denoting where to cut a large video to extract the cliplet, or any other representation of a subsequence of video. Third, a cliplet is a semantically meaningful portion of video containing what a viewer might consider a single short event (such as a sound bite). The cliplet has a single theme or a common thread that makes the cliplet stand apart from the source video.
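As an illustration of the first two features, a cliplet can be represented as a pair of cut points into the source video and checked against hard duration bounds. The following is a minimal sketch only; the names `Cliplet` and `satisfies_hard_constraint` are hypothetical, and the default bounds of five and ten seconds are taken from the typical range mentioned above rather than from any required value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cliplet:
    """A cliplet stored as a pair of cut points (in seconds) into a source video.

    Hypothetical representation: no video data is copied, only the
    starting and stopping points are kept.
    """
    start: float
    stop: float

    @property
    def duration(self) -> float:
        return self.stop - self.start


def satisfies_hard_constraint(cliplet: Cliplet,
                              min_len: float = 5.0,
                              max_len: float = 10.0) -> bool:
    """Check a hard duration constraint (assumed lower and upper bounds)."""
    return min_len <= cliplet.duration <= max_len


sound_bite = Cliplet(start=12.0, stop=19.5)
print(sound_bite.duration, satisfies_hard_constraint(sound_bite))
```

A soft constraint would replace the boolean test with a score that also weighs factors such as sub-shot boundary frequency; that is not shown here.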

The relatively short length of a cliplet as compared to the source video allows the cliplet to be manipulated more like a digital photograph than a digital video. Video cliplets allow a shift away from large videos that are burdensome to manipulate and store. Cliplets focus on short, exciting segments of video rather than on long, dull videos. Consumers tend to become bored watching hours of a long video that contains only a few interesting scenes. Rather than requiring constant use of the fast-forward button, cliplets allow consumers to extract the interesting scenes, the “heart” of the long source video.

Cliplets also are easier than large source videos to manipulate and store. User resistance to uploading and sharing videos due to their large size is minimized by generating cliplets from the source video. Cliplets avoid multi-megabyte or multi-gigabyte source videos. By definition, cliplets are smaller than a source video. Thus, operations that are impractical on a source video due to limited memory, storage, processing power, bandwidth or human attention can be performed with ease on cliplets. Because of its smaller size, a cliplet has a shorter upload time, makes fewer demands on bandwidth, requires less disk space and generally is easier to manage than a large source video.

Most operations that apply to a digital photograph have an analog for video cliplets. Because of its small size, a video cliplet can be browsed using thumbnails, organized by time stamp and gross pixel statistics, cut and pasted into documents, and sent easily over e-mail. In theory, most of these operations already exist for videos, but in practice the capabilities are rarely used by consumers because typical home videos are too large, too long and too boring. Image processing and computer vision algorithms that are unable to process large source videos can be used easily on cliplets. Technologies such as image stabilization, color correction, panorama creation, three-dimensional depth understanding, face recognition, and person tracking can be used on cliplets in real time.

FIG. 1 is a diagram illustrating the concept of the video cliplet in relation to a source video and video frames. A digital source video 100 of length or duration T contains a plurality of video frames 105. As shown in FIG. 1, the digital source video 100 is divided into a plurality of cliplets C(1) to C(N). These cliplets can be of varying lengths.

As explained above, each of these cliplets, C(1) to C(N), is a semantically meaningful portion of the source video 100. In some cases, two or more cliplets can overlap in time and thus share the same video frames. Referring to FIG. 1, cliplet C(4) has a length T(4) and cliplet C(5) has a length T(5). Even though T(4) is less than T(5), cliplets C(4) and C(5) overlap in time. In addition, cliplets C(4) and C(5) share the video frames shown by reference numeral 110.
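The overlap between two cliplets can be computed directly from their frame ranges. The sketch below assumes, purely for illustration, that each cliplet is given as an inclusive pair of frame indices; the function name `shared_frames` and the example frame numbers are hypothetical and are not taken from FIG. 1.

```python
def shared_frames(a, b):
    """Return the range of frame indices common to two cliplets.

    Each cliplet is an inclusive (first_frame, last_frame) pair; this
    representation is an assumption made for the example.
    """
    first = max(a[0], b[0])
    last = min(a[1], b[1])
    # An empty range means the cliplets do not overlap in time.
    return range(first, last + 1) if first <= last else range(0)


c4 = (300, 420)   # shorter cliplet
c5 = (400, 600)   # longer cliplet that starts before c4 ends
overlap = shared_frames(c4, c5)
print(len(overlap), overlap[0], overlap[-1])
```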

II. General Overview

The video composition authoring system and method has the capability to work with video cliplets. However, it should be noted that video cliplets are not necessary. The input to the video composition authoring system includes data elements such as, for example, a set of cliplets, video files, a set of directories containing cliplets, and links to cliplets. The output of the system is an aesthetically-pleasing layout or composition that may contain cliplets and other multimedia elements such as music, text and photographs. The video composition output can be thought of as a collage of multimedia elements that are brought together for the purpose of creating the composition. The word “collage” is meant to suggest that smaller elements are pieced together, in time, in space, or both, to create a larger composition. By way of example, a two-dimensional layout of cliplets displayed on a screen is a collage, and so is a single movie composed of several cliplets playing on the screen one after another.

FIG. 2 is a block diagram illustrating an overview of the video composition authoring system 200 disclosed herein. In general, the system 200 inputs initial information and outputs a video composition containing multimedia elements. Specifically, the video composition authoring system 200 inputs data elements 210. The data elements include multimedia elements, parameter information and description information. By way of example, multimedia elements include a set of video cliplets, video, background music, background photographs, clip art, text descriptions, titles, and so forth. Moreover, parameter information describes parameters of the video composition and includes, for example, the size of the composition and the duration of each video within the video composition. In addition, the description information includes, for example, time and location information about the cliplets or video and a description of the desired style or mood of the composition. A user input 220 can be used to select all, none, or any amount in between of the data elements 210 to use as input. This user input 220 is an optional process, as shown by the dashed line. If no user input 220 is received, the system 200 automatically selects the data elements 210.

The video composition authoring system 200 includes an element selection and layout module 230 for selecting the data elements and designing the layout of the elements 210 in the video composition. The element selection and layout module 230 uses those data elements 210 selected by a user in the user input 220. If there is no user input 220, then the element selection and layout module 230 automatically selects data elements 210 to use. Any data elements 210 that are needed to complete the video composition but were not specified by the user are automatically selected by the element selection and layout module 230 to complement or supplement the user's choices.

Depending on the user's choice of output (or an automatically selected output if there is no user input 220 available), the element selection and layout module 230 applies an automatic layout algorithm to lay out the selected data elements 210. Automatic layout of the data elements 210 occurs in an aesthetically-pleasing manner while respecting any constraints and requests specified by the user in the user input 220. These constraints may be explicit as specified by the user or implicit based on hardware limitations (such as the viewing size of a monitor). Output from the element selection and layout module 230 are preliminary video composition results 240. At this point, the element selection and layout module 230 has generated a preliminary video composition containing the selected data elements in a preliminary layout.

The video composition authoring system 200 includes an iterative refinement module 250 for presenting the preliminary video composition results 240 to the user and allowing the user to change and refine any elements the user does not like. In particular, the iterative refinement module 250 presents the preliminary video composition results 240 to the user for a user review 260. The user review 260 is an optional process, as shown by the dashed lines. If no user review 260 occurs, then the video composition authoring system 200 outputs the preliminary video composition results 240 as a final output.

During the user review 260, the user can view the preliminary video composition results 240 and determine which portions are unacceptable. For example, the iterative refinement module 250 allows the user to change parameters, manually drag and drop different cliplets into the composition, add text bubbles to the cliplets, and crop certain cliplets using tracking algorithms to track moving objects. The changes are made by the iterative refinement module 250 and updated preliminary video composition results 240 containing the refinements are presented to the user. Once the user is satisfied with the current preliminary video composition results 240, the authoring process is finished and a finished video composition 270 is output. The output form of the finished video composition 270 can be, for example, a two-dimensional collage, a single movie, or a photoalbum-style hypertext “book”. The output form can be selected by the user or, in the absence of a user choice, be automatically selected by the system 200.

III. Exemplary Operating Environment

The video composition authoring system 200 is designed to operate in a computing environment. The following discussion is intended to provide a brief, general description of a suitable computing environment in which the invention may be implemented.

FIG. 3 is a block diagram illustrating a computing apparatus suitable for carrying out the invention. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with a variety of computer system configurations, including personal computers, server computers, hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located on both local and remote computer storage media including memory storage devices.

With reference to FIG. 3, an exemplary system for implementing the invention includes a general-purpose computing device 300. In particular, the computing device 300 includes the processing unit 302, a system memory 304, and a system bus 306 that couples various system components including the system memory 304 to the processing unit 302. The system bus 306 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory includes read only memory (ROM) 310 and random access memory (RAM) 312. A basic input/output system (BIOS) 314, containing the basic routines that help to transfer information between elements within the computing device 300, such as during start-up, is stored in ROM 310. The computing device 300 further includes a hard disk drive 316 for reading from and writing to a hard disk, not shown, a magnetic disk drive 318 for reading from or writing to a removable magnetic disk 320, and an optical disk drive 322 for reading from or writing to a removable optical disk 324 such as a CD-ROM or other optical media. The hard disk drive 316, magnetic disk drive 318 and optical disk drive 322 are connected to the system bus 306 by a hard disk drive interface 326, a magnetic disk drive interface 328 and an optical disk drive interface 330, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the computing device 300.

Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 320 and a removable optical disk 324, it should be appreciated by those skilled in the art that other types of computer readable media that can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories (RAMs), read-only memories (ROMs), and the like, may also be used in the exemplary operating environment.

A number of program modules may be stored on the hard disk, magnetic disk 320, optical disk 324, ROM 310 or RAM 312, including an operating system 332, one or more application programs 334, other program modules 336 (such as the video composition authoring system 200) and program data 338. A user (not shown) may enter commands and information into the computing device 300 through input devices such as a keyboard 340 and a pointing device 342. In addition, a camera 343 (such as a video camera) may be connected to the computing device 300 as well as other input devices (not shown) including, for example, a microphone, joystick, game pad, satellite dish, scanner, or the like. These other input devices are often connected to the processing unit 302 through a serial port interface 344 that is coupled to the system bus 306, but may be connected by other interfaces, such as a parallel port, a game port or a universal serial bus (USB). The monitor 346 (or other type of display device) is also connected to the system bus 306 via an interface, such as a video adapter 348. In addition to the monitor 346, computing devices such as personal computers typically include other peripheral output devices (not shown), such as speakers and printers.

The computing device 300 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 350. The remote computer 350 may be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computing device 300, although only a memory storage device 352 has been illustrated in FIG. 3. The logical connections depicted in FIG. 3 include a local area network (LAN) 354 and a wide area network (WAN) 356. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

When used in a LAN networking environment, the computing device 300 is connected to the local network 354 through a network interface or adapter 358. When used in a WAN networking environment, the computing device 300 typically includes a modem 360 or other means for establishing communications over the wide area network 356, such as the Internet. The modem 360, which may be internal or external, is connected to the system bus 306 via the serial port interface 344. In a networked environment, program modules depicted relative to the computing device 300, or portions thereof, may be stored in the remote memory storage device 352. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

IV. Operational Overview and Details

FIG. 4 is a general flow diagram illustrating the operation of the video composition authoring system 200 shown in FIG. 2. In general, the video composition authoring system 200 provides an aesthetically-pleasing layout of data elements, including multimedia elements (such as video, cliplets, and sound). In particular, the video composition authoring system 200 operates by inputting data elements (box 400). These data elements include multimedia elements, parameter information and description information as described above.

Any data elements that are missing but needed (as determined by the system 200) are automatically selected (box 410). Thus, if a user specifies fewer than all the data elements needed to author a video composition, then the missing data elements are automatically chosen by the system 200. Next, preliminary video composition results are generated for user review (box 420). The user is allowed to make refinements and changes to any portion of the preliminary video composition results (box 430).
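The flow of FIG. 4 can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the disclosed implementation: data elements are modeled as a plain dictionary, the system's automatic choices are modeled as a `defaults` dictionary, and the optional user review is modeled as a callback that returns a dictionary of requested changes (empty when the user is satisfied). All of these names are hypothetical.

```python
def author_composition(user_elements, defaults, review=None):
    """Sketch of boxes 400-430: input, auto-select, preview, refine."""
    # Boxes 400 and 410: take the user's elements and fill in any that
    # are missing with automatically selected ones.
    elements = {**defaults, **user_elements}

    # Box 420: generate preliminary results (here, simply a copy).
    preliminary = dict(elements)

    # Box 430: optional refinement loop; ends when the reviewer
    # requests no further changes.
    while review is not None:
        changes = review(preliminary)
        if not changes:
            break
        preliminary.update(changes)

    return preliminary


defaults = {"music": "default.mp3", "title": "Untitled", "output_form": "collage"}
print(author_composition({"title": "Picnic"}, defaults))
```

With no `review` callback the preliminary results are returned unchanged, which mirrors the case in which no user review occurs.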

Element Selection and Layout Module

FIG. 5 is a detailed flow diagram illustrating the operational details of the element selection and layout module 230 shown in FIG. 2. The element selection and layout module 230 selects data elements and designs the layout of the elements within a video composition. Selection of the data elements occurs by user-specified instructions, automatically-generated selection by the module 230, or a combination of both.

The operation of the element selection and layout module 230 starts (box 500) by determining whether a user wants to select data elements (box 510). The user has the capability to select all of the data elements, none of the data elements, or a combination of user-selection and automatic-selection. If a user decides to select data elements, the selected data elements are inputted to the module 230 (box 520). Otherwise, the operation skips inputting user-selected data elements.

Next, the module 230 automatically selects and obtains any missing data element that is needed but was not specified or selected by the user (box 530). The module 230 may automatically select all, none, or any amount in between of the necessary data elements, depending on the amount of user input. Once the data elements are selected, a determination is made whether the user wants to specify the layout of the data elements (box 540). If so, then the specified element layout is received as input from the user (box 550). Otherwise, this process is skipped.

The layout of elements not having a layout specified by a user then is automatically performed (box 560). Once again, the number of data elements having their layout determined automatically is a function of how much input the user provides. If little or no user input is provided, then the module 230 automatically specifies the layout for all or most of the selected data elements. On the other hand, if most or all of the layout for the data elements is specified by the user, the module 230 automatically determines few or none of the layout for the data elements. Once the layout of the data elements is determined, a video composition is sent as output (box 570).
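The interplay between user-specified and automatic layout (boxes 540 to 560) can be illustrated with a deliberately simple stand-in. The text does not specify the automatic layout algorithm, so the sketch below assumes a plain row-major grid: positions given by the user are kept, and every remaining element is placed in the next free grid cell. The function name `layout_elements` and the `columns` parameter are hypothetical.

```python
import itertools


def layout_elements(names, user_layout, columns=3):
    """Assign a (row, col) grid cell to every element.

    Cells given in user_layout are respected; elements without a
    user-specified cell are placed automatically in row-major order.
    The grid is an assumed stand-in for the unspecified algorithm.
    """
    layout = dict(user_layout)
    taken = set(layout.values())
    # Lazily enumerate grid cells that the user has not claimed.
    free_cells = (
        (row, col)
        for row in itertools.count()
        for col in range(columns)
        if (row, col) not in taken
    )
    for name in names:
        if name not in layout:
            layout[name] = next(free_cells)
    return layout


print(layout_elements(["a", "b", "c", "d"], {"c": (0, 0)}))
```

The more cells the user specifies, the fewer the function fills in, which matches the description that the amount of automatic layout depends on how much input the user provides.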

Functionality of the Element Selection and Layout Module

The element selection and layout module 230 contains certain functionality that allows a video composition to be authored. The specific functionality is as follows:

Cliplet Interest Ratings

If the video composition authoring system 200 uses cliplets, the cliplets may have interest ratings assigned to them based upon processing technologies that are available to provide information about the cliplet. For example, if face detection technology is available, then each individual cliplet can be processed to detect faces. The information obtained from this processing, such as whether the cliplet contains a face, is then stored with each individual cliplet. Based on this information an interest rating in face detection then can be determined for each cliplet. The interest ratings are associated per cliplet, rather than per video frame. Computation of the features used in the rating process, however, may have been performed per frame, and stored for later use during the cliplet rating process.

Cliplet ratings can be based on any information relevant to and available for a cliplet. This cliplet rating information includes time stamps, location stamps, audio signal, video signal and all of the information and analyses as discussed above concerning sub-shot boundary detection. Cliplet rating information can take advantage of whatever technology is available to provide information about a cliplet. This includes voice recognition, speaker recognition, face detection, zoom detection, pan detection, any type of audio analyses or recognition, and any type of video analyses or recognition. Any of these technologies may be used to generate an interest rating for an individual cliplet. By way of example, if the interest rating is in detecting faces, then cliplets containing faces would have a higher interest rating than those cliplets without faces, and among cliplets with faces, those which contain faces facing the camera for a greater percentage of the time may be rated higher. As another example, if the interest rating is in close-ups, then cliplets that immediately follow a zooming event would have a higher interest rating than other cliplets.

Cliplet interest ratings may be multi-dimensional. For example, a cliplet may have a rating for “audio activity level” as well as separate ratings for “visual activity level” and “occurrence of faces”. Ratings may be absolute numeric values or may be relative orderings (or rankings) between cliplets. By way of example, assume that a rating is to be assigned to a cliplet based on audio. This can be performed by computing a variance in an audio power signal, normalized over all known cliplets. In another example of cliplet rating using vision, assume that camera zoom or pan is detected and higher ratings are assigned to cliplets immediately following a zoom or pan event. Yet another example, cliplet rating using duration, is to make a rating proportional to a Gaussian centered on durations of x seconds, where x might be based on user preferences or expectations.
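The audio and duration ratings just described might be computed as in the following Python sketch. It is illustrative only: the function names are hypothetical, normalization is assumed to mean dividing by the largest variance among the known cliplets, and the Gaussian width sigma_s is an assumed parameter that the description does not specify.

```python
import math
from statistics import pvariance


def audio_ratings(power_by_cliplet):
    """Rate each cliplet by the variance of its audio power samples.

    Assumption: "normalized over all known cliplets" is taken to mean
    scaling by the largest variance, giving ratings in [0, 1].
    """
    variances = {cid: pvariance(samples) for cid, samples in power_by_cliplet.items()}
    peak = max(variances.values(), default=0.0)
    if peak == 0:
        return {cid: 0.0 for cid in variances}
    return {cid: v / peak for cid, v in variances.items()}


def duration_rating(duration_s, preferred_s, sigma_s=3.0):
    """Gaussian centered on the preferred duration x; sigma_s is an assumed width."""
    return math.exp(-((duration_s - preferred_s) ** 2) / (2 * sigma_s ** 2))
```

A cliplet whose duration equals the preferred duration receives the maximum rating of 1.0, and the rating falls off symmetrically for shorter or longer cliplets.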

Automatic Space Adaptation

The output of the video composition authoring system 200 includes a video composition containing a collage of cliplets. The collage includes a plurality of windows, with a cliplet contained in each window. Depending on the size of the windows, the collage of cliplets can adapt so that the cliplets will move around and try to fit in the available space, while maintaining some pleasant or aesthetically-pleasing layout. This space adaptation is performed automatically by the element selection and layout module 230.

Automatic Selection of Cliplet Collage Contents

If a user wants to provide little or no input, the video composition authoring system 200 can automatically select cliplets to populate the windows of the video composition collage based on user preferences or randomly.

By way of example, the video composition authoring system 200 can create a plurality of windows that are blank. The windows then can be filled automatically by using one or more of the following algorithms. One algorithm for the automatic selection could be based on interest ratings. For example, the top N cliplets in a category can be determined and taken from each category and placed in each of the windows automatically. These categories may include, for example, interesting audio in an audio ratings category, interesting faces in a facial ratings category, and close-ups in a zoom ratings category. The criteria for what constitutes the “top” is dependent on the category. For example, in the zoom category, the cliplets containing close-ups would be considered the “top” in that category.
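As a rough illustration of the top-N algorithm, the following Python sketch ranks cliplets within each category and keeps the N best. The function name and the nested-dictionary input format are assumptions chosen for the example.

```python
def top_n_per_category(ratings, n):
    """Return the n highest-rated cliplet ids in each category.

    ratings: {category: {cliplet_id: score}}; a higher score means more
    interesting within that category (an assumed convention).
    """
    return {
        category: sorted(scores, key=scores.get, reverse=True)[:n]
        for category, scores in ratings.items()
    }
```

This sketch does not prevent the same cliplet from appearing in more than one category; a fuller implementation might remove duplicates before filling the windows.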

Another algorithm for automatic selection could be using the interest ratings along with a time constraint. If, out of the top N cliplets in a category, two of the cliplets are close in time, then the module 230 assumes that the two cliplets are from the same scene. In order to provide variety, one of the cliplets is chosen and the other is discarded.
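The time constraint might be sketched in Python as follows. The description does not say which of two nearby cliplets is kept or how close counts as “close in time”; this sketch assumes the higher-ranked cliplet is kept and uses a hypothetical min_gap_s threshold.

```python
def drop_same_scene(ranked, start_times, min_gap_s=30.0):
    """Filter a best-first list of cliplet ids for variety.

    A cliplet is kept only if it starts at least min_gap_s seconds away
    from every cliplet already kept. Keeping the higher-ranked cliplet
    and the 30-second default are assumptions, not taken from the source.
    """
    kept = []
    for cid in ranked:
        if all(abs(start_times[cid] - start_times[k]) >= min_gap_s for k in kept):
            kept.append(cid)
    return kept
```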

Another algorithm is to perform a random selection from the top cliplets in each category. For example, instead of taking only the top N cliplets in each category, this algorithm designates a top M number of cliplets from a category (where M>N), and randomly selects N cliplets from the M available cliplets. In this manner, variety is maintained in the output video composition.
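A minimal Python sketch of the random selection follows. The function name and the optional rng argument, included so that results can be made repeatable, are assumptions for illustration.

```python
import random


def random_from_top(scores, n, m, rng=None):
    """Pick n cliplet ids at random from the m highest-rated ones (m > n).

    scores: {cliplet_id: score} for one category.
    rng: optional random.Random instance for repeatable results.
    """
    if not m > n:
        raise ValueError("m must be greater than n")
    rng = rng or random.Random()
    top_m = sorted(scores, key=scores.get, reverse=True)[:m]
    # Guard against categories holding fewer than n cliplets.
    return rng.sample(top_m, min(n, len(top_m)))
```

Because the draw is limited to the top M, a low-rated cliplet is never selected, yet repeated runs can still produce different compositions.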

Usage Statistics

The element selection and layout module 230 also can select and populate the video composition collage based on usage statistics. Usage statistics track the frequency of cliplet usage. These results may be displayed to a user. Usage statistics are computed as users interact with cliplets through the element selection and layout module 230.

Every time a user views or selects a cliplet the usage rating for that cliplet increases. Usage statistics are a type of cliplet interest rating that alleviates the need to explicitly ask the user to specify what type or category of cliplet he prefers. Over time usage statistics become more accurate in determining which cliplets are interesting to a user.

Usage statistics can be correlated with other cliplet interest ratings. This correlation can be used to adjust and train the interest ratings and the cliplet rating process over time based on the usage ratings and what is interesting to the user. By way of example, if a user is consistently looking at cliplets that have a lot of interesting audio, then it can be deduced that the audio interest ratings are important to the user. Thus, it can be determined that the user prefers and is interested in cliplets having high audio interest ratings. By correlating the usage statistics with the interest ratings, over time the system 200 “learns” the preferences of the user. This knowledge can be used, for example, when selecting cliplets to populate windows of the video composition collage.
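One possible way to correlate usage statistics with interest ratings is sketched below in Python. The description does not name a correlation measure; this sketch assumes the Pearson correlation coefficient and treats a negative correlation as a weight of zero. All function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def category_weights(usage_counts, ratings):
    """Weight each rating category by how well it tracks actual usage.

    usage_counts: {cliplet_id: times viewed or selected}
    ratings: {category: {cliplet_id: score}}
    Negative correlations are clamped to 0.0 (an assumed policy).
    """
    ids = sorted(usage_counts)
    usage = [usage_counts[c] for c in ids]
    return {
        category: max(0.0, pearson(usage, [scores[c] for c in ids]))
        for category, scores in ratings.items()
    }
```

With such weights, a category whose ratings rise and fall with the user's viewing habits would count more heavily when cliplets are later selected to populate the collage windows.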

Video Composition Output Style Selection

The element selection and layout module 230 allows a user to select an output style of the video composition. According to the style selected, the module 230 selects and arranges data elements in accordance with the selected style. For example, if a user selected a “romantic” style, the module 230 might select soft music, choose video elements lacking fast action, add slow motion, and slightly blur the video elements in keeping with the romantic theme.

Iterative Refinement Module

FIG. 6 is a detailed flow diagram illustrating the operational details of the iterative refinement module 250 shown in FIG. 2. The iterative refinement module 250 presents preliminary video composition results to a user. If the user likes the results, the preliminary video composition results are left unchanged and sent as a video composition output. If the user does not like the results, the iterative refinement module 250 allows the user to make changes and refine the data elements or their layout that are unacceptable to the user.

The operation of the iterative refinement module 250 begins (box 600) by determining whether a user wants to view the preliminary video composition results (box 610). If so, then the module 250 presents the preliminary video composition results to the user (box 620). If not, then the preliminary video composition results are considered a completed video composition. Next, a determination is made whether the user wants to make changes to the preliminary video composition results (box 630). If the user does want to make changes, the module 250 allows the user to input the changes and then performs those changes specified (box 640). Then the updated preliminary video composition results are presented to the user (box 620). This iterative process continues until the user is satisfied with the preliminary video composition results. If the user does not want to make changes, then the preliminary video composition results are sent as output as a completed video composition (box 650).
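The loop of FIG. 6 can be sketched in Python as follows. The callback parameters (next_change, apply_change, present) are hypothetical stand-ins for the user interface and editing operations, which the description leaves open.

```python
def refine(preliminary, wants_to_view, next_change, apply_change, present):
    """Iteratively refine preliminary results until the user is satisfied.

    wants_to_view: True if the user wants to review the results (box 610).
    next_change(result): returns a requested change, or None when satisfied.
    apply_change(result, change): returns the updated results.
    present(result): shows the results to the user.
    """
    result = preliminary
    if not wants_to_view:
        return result  # treated as a completed video composition
    present(result)  # box 620
    while (change := next_change(result)) is not None:  # box 630
        result = apply_change(result, change)  # box 640
        present(result)  # box 620 again, with the updated results
    return result  # box 650
```

The results are presented once up front and once after every change, so the user always reviews the latest state before deciding whether to continue.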

Functionality of the Iterative Refinement Module

The iterative refinement module 250 contains certain functionality that allows a video composition to be authored. The specific functionality is as follows:

Drag and Drop Refinement

The iterative refinement module 250 provides drag and drop functionality such that elements of the video composition may be modified, added, or deleted. This drag and drop functionality allows a user to drag and drop a desired cliplet to a desired position within the video composition.

Refinement of Cliplet Boundaries

Even though the starting and ending points (or editing points) of cliplets already are determined, a user may be unsatisfied with them. The iterative refinement module 250 allows the user to lengthen cliplets by merging a cliplet with its temporal neighbors. This is achieved by using an input device to merge at least two of the video cliplets such that a first cliplet merges with a second cliplet to create a new, third cliplet. The third cliplet is the duration of the first and second cliplets combined. This allows a user to lengthen and combine a cliplet with any of its temporal neighbors without requiring any scrubbing.
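Merging two temporally neighboring cliplets might look like the following Python sketch. The Cliplet record with start and end times in seconds is an assumed representation, and the sketch assumes the two cliplets are contiguous in the source video, so the merged duration equals the sum of the two durations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cliplet:
    start_s: float  # starting editing point, seconds into the source video
    end_s: float    # ending editing point, seconds into the source video

    @property
    def duration_s(self):
        return self.end_s - self.start_s


def merge_neighbors(first, second):
    """Create a new, third cliplet spanning two temporal neighbors.

    Assumes the cliplets are adjacent in the source video; the argument
    order does not matter.
    """
    a, b = sorted((first, second), key=lambda c: c.start_s)
    return Cliplet(start_s=a.start_s, end_s=max(a.end_s, b.end_s))
```

The original cliplets are left untouched because the record is immutable, so the merge can be undone simply by discarding the third cliplet.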

If the user is still unhappy with the editing points of a cliplet, the iterative refinement module 250 includes functionality that allows a user to extend or shrink either the starting or the ending points. To avoid scrubbing, the user has the option of having the system automatically find one or two new editing points. Through the input device, the user can request that the iterative refinement module 250 present other starting and ending point possibilities for the cliplet.

Title and Credits Generation

Title and credit generation require a user to enter the cast of the video composition collage and who contributed to the video footage. Next, the iterative refinement module 250 generates a credits and cast list. If less user input is desired, the module 250 can use face recognition technology to “learn” the names of people. This occurs by having the user enter the name of a person once, after which the module 250 automatically recognizes that person subsequently and is able to automatically create a cast list.

Text Annotation

Text may be added to the video composition either by user input or automatically by the system 200. Text may be used to annotate, describe, or complement the video composition. The iterative refinement module 250 provides a user the functionality to add text annotation and determine where on the video composition the text will be located. In addition, various text styles and sizes are available for the user to choose.

V. Working Example

The following working example is used to illustrate the operational details of the invention. This working example is provided as an example of one way in which the video composition authoring system 200 may be implemented. It should be noted that this working example is only one way in which the invention may be implemented, and is provided for illustrative purposes only.

FIG. 7 is a working example of a video composition authored using the video composition authoring system 200. In this working example, a video composition 700 is a two-dimensional layout or collage of cliplets and other multimedia elements. The video composition 700 includes a plurality of windows 710, 720, 730, 740, 750, 760 and 770, arranged in an aesthetically-pleasing manner. Within each of the plurality of windows, 710, 720, 730, 740, 750, 760 and 770, are cliplets capable of being played within each window. In addition, the video composition 700 includes a title 775 describing the video composition 700. The video composition 700 also includes added text that enhances the contents of a window. For example, in windows 740 and 770 text has been added that describes or enhances the contents. A background picture 780 of a sunset has been added to the video composition 700. Also included in the video composition is an audio file containing Hawaiian music.

The following discussion is intended to provide an example of how the video composition authoring system 200 might obtain the elements of a video composition, such as the video composition 700 shown in FIG. 7. Suppose a user provides nothing but a set of cliplets that contain the date and time and the location of where they were photographed. The video composition authoring system 200 might do the following. First, the element selection and layout module 230 might select the best cliplets from the input set of cliplets. Determining the best cliplets may be based on cliplet ratings. Next, a title might be deduced by using the calendar information. For example, if the video was photographed in Hawaii on or around February 14th, the title generated might be “Valentine's Day in Hawaii”.
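Deducing a title from the date and location stamps might be sketched in Python as follows. The HOLIDAYS table, the window_days tolerance for “on or around”, and the fallback title format are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical calendar lookup: (month, day) -> holiday name.
HOLIDAYS = {(2, 14): "Valentine's Day"}


def deduce_title(shot_on, location, window_days=2):
    """Build a title from the capture date and location.

    If the date falls within window_days of a known holiday, the holiday
    name is used; otherwise the title falls back to location and year.
    """
    for (month, day), name in HOLIDAYS.items():
        holiday = date(shot_on.year, month, day)
        if abs((shot_on - holiday).days) <= window_days:
            return f"{name} in {location}"
    return f"{location} {shot_on.year}"
```

For instance, deduce_title(date(2002, 2, 13), "Hawaii") yields the title used in the working example, since February 13 falls within the assumed two-day window around the holiday.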

In order to determine what other data elements should be selected, the element selection and layout module 230 might use a natural-language database to generate key words or search terms. For example, using the “Valentine's Day in Hawaii” example, these terms may include “romantic”, “aloha”, “Kauai”, “Oahu”, “St. Valentine”, and “chocolate”. Next, a search may be performed, either on the user's database or on the Internet, for additional data elements. For example, the element selection and layout module 230 might search a free music Internet site for “Hawaii” and “romantic” and return an audio file containing Hawaiian music.

Similarly, a background photograph could be selected by the element selection and layout module 230 using the search terms and the user's own database, the Internet, or both. For example, the element selection and layout module 230 might search the user's hard drive for a background photograph of a sunset. In addition, a “romantic” mood might be selected for the video composition such that selected cliplets will tend to be longer. Moreover, the cliplets selected will tend to be biased toward those cliplets that do not contain fast motion. In keeping with the romantic mood selection, the selected cliplets may undergo image processing such as intentional blurring to soften the edges. Face detection may be used to select cliplets that contain close-ups of faces.

The foregoing Detailed Description has been presented for the purposes of illustration and description. Many modifications and variations are possible in light of the above teaching. It is not intended to be exhaustive or to limit the subject matter described herein to the precise form disclosed. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims appended hereto.

1. A method for authoring a video composition from source video, comprising: inputting a set of data elements that includes video; determining required data elements needed to author the video composition; automatically selecting any of the required data elements that have not been previously determined; using the required data elements to generate preliminary video composition results; and presenting the preliminary video composition results to a user to allow user refinement of the preliminary video composition results to produce the video composition.
2. The method of claim 1, further comprising determining whether the user wants to select any of the required data elements and, if so, allowing the user to select any of the required data elements.
3. The method of claim 1, further comprising allowing a user to specify a layout of the required data elements within the video composition.
4. The method of claim 3, further comprising: determining that the user has not specified layout information for each required data element; and automatically specifying the layout information not specified by the user.
5. The method of claim 1, wherein the video includes a set of cliplets and wherein automatically selecting any of the required elements further comprises selecting at least some of the set of cliplets for the video composition based on cliplet interest ratings.
6. The method of claim 5, wherein cliplet interest ratings include at least one of the following: (a) voice recognition; (b) speaker recognition; (c) face detection; (d) zoom detection; (e) pan detection; (f) any type of audio analyses; (g) any type of audio recognition; (h) any type of video analyses; (i) any type of video recognition.
7. The method of claim 6, wherein the cliplet interest ratings are correlated with cliplet usage statistics.
8. A method for authoring a video composition from source video using short segments of the source video called video cliplets, comprising: generating a set of windows to be contained within the video composition; automatically selecting a cliplet to be placed within each window of the set of windows so as to create the video composition based on multi-dimensional cliplet interest ratings associated with the cliplet; presenting the video composition to a user for review; allowing the user to change the cliplets within each of the windows if the user is not satisfied with the video composition after the review; and allowing a user to choose an output style of the video composition.
9. The method of claim 8, wherein automatically selecting a cliplet further comprises correlating usage statistics with the cliplet interest ratings to increase the accuracy of the cliplet interest ratings.
10. The method of claim 8, wherein automatically selecting a cliplet further comprises: defining cliplet interest rating categories; ranking each of the cliplets in each of the cliplet interest rating categories; and selecting the top-ranked cliplets in each of the cliplet interest rating categories to be placed in each of the windows.
11. The method of claim 10, further comprising: determining whether any of the top-ranked cliplets in each of the cliplet interest rating categories are close in time; and if so, then choosing one of the top-ranked cliplets that are close in time and discarding the other ones of the top-ranked cliplets that are close in time.
12. The method of claim 8, wherein automatically selecting a cliplet further comprises: defining cliplet interest rating categories; defining a set of M number of cliplets within each of the cliplet interest rating categories; and randomly selecting for each of the cliplet interest rating categories a set of N number of cliplets from the set of M number of cliplets, where M is greater than N.
13. The method of claim 8, wherein allowing the user to change the cliplets within each of the windows further comprises providing drag and drop functionality such that using an input device a user can perform at least one of the following in the video composition: (a) change a location of a cliplet; (b) modify the cliplet; (c) add a cliplet; (d) delete a cliplet.
14. The method of claim 8, further comprising allowing the user to add text annotation to any of the cliplets.
15. The method of claim 8, further comprising allowing a user to add text annotation to the video composition.
16. The method of claim 8, further comprising allowing a user to add at least one of the following to the video composition: (a) a title for the video composition; (b) a credits list.
17. A computer-readable medium having computer-executable instructions for authoring and generating a video composition using video cliplets generated from digitized video, comprising: creating a layout of the video composition as a two-dimensional collage of cliplet windows; allowing a user to select video cliplets to place in the cliplet windows; and automatically selecting video cliplets to place in the cliplet windows if the user does not select enough video cliplets to fill each of the cliplet windows.
18. The computer-readable medium of claim 17, wherein creating a layout of the video composition further comprises automatically determining a layout of the two-dimensional collage of cliplet windows.
19. The computer-readable medium of claim 17, wherein creating a layout of the video composition further comprises allowing a user to determine a layout of the two-dimensional collage of cliplet windows.
20. The computer-readable medium of claim 17, further comprising allowing the user to review the video composition and change the video composition if the user desires.